In this section, we list several works that modify the structure of BNNs to achieve better performance or convergence. XNOR-Net and Bi-Real Net make minor adjustments to the original networks, while MCN proposes new filters and convolutional operations. The loss function is also adjusted according to these new filters, as will be introduced in Section 1.1.5.

1.1.5 Loss Design

During neural network optimization, the loss function measures the difference between a model's predicted values and the ground truth. Classical loss functions, such as the least squares loss and the cross-entropy loss, are widely used in regression and classification problems. This section reviews the specific loss functions designed for BNNs.
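As a point of reference for the designs reviewed below, the two classical losses mentioned above can be written in a few lines. The following PyTorch sketch is purely illustrative and is not taken from any of the cited works.

```python
import torch
import torch.nn.functional as F

# Least squares (MSE) loss for a regression output.
pred = torch.randn(8, 1)             # predicted values
target = torch.randn(8, 1)           # real (ground-truth) values
mse_loss = F.mse_loss(pred, target)

# Cross-entropy loss for a 10-class classification output.
logits = torch.randn(8, 10)          # unnormalized class scores
labels = torch.randint(0, 10, (8,))  # ground-truth class indices
ce_loss = F.cross_entropy(logits, labels)

print(mse_loss.item(), ce_loss.item())
```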

MCNs [236] propose a novel loss function that considers filter loss, center loss, and softmax loss in an end-to-end framework. The loss function in MCNs consists of two parts:
$$\mathcal{L} = \mathcal{L}_M + \mathcal{L}_S. \tag{1.13}$$
The first part, $\mathcal{L}_M$, is:

$$\mathcal{L}_M = \frac{\theta}{2}\sum_{i,l}\big\|C_i^l - \hat{C}_i^l \circ M^l\big\|^2 + \frac{\lambda}{2}\sum_{m}\big\|f_m(\hat{C},\vec{M}) - \bar{f}(\hat{C},\vec{M})\big\|^2, \tag{1.14}$$

where $C$ denotes the full-precision weights, $\hat{C}$ the binarized weights, and $M$ the M-Filters defined in Section 1.1.4; $f_m$ denotes the feature map of the last convolutional layer for the $m$th sample, and $\bar{f}$ denotes the class-specific mean feature map over previous samples. The first term of $\mathcal{L}_M$ is the filter loss, while the second term computes the center loss; $\mathcal{L}_S$ is a conventional loss function, such as the softmax loss.
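A minimal sketch of how $\mathcal{L}_M$ in Eq. (1.14) could be evaluated is given below, assuming per-layer lists of full-precision weights, binarized weights, and M-Filters with broadcast-compatible shapes, together with last-layer feature maps and per-class mean feature maps. The function name and tensor layout are hypothetical and simplify the actual MCN implementation.

```python
import torch

def mcn_loss_m(C, C_hat, M, feats, class_means, labels, theta=1e-3, lam=1e-3):
    """Sketch of L_M in Eq. (1.14): filter loss + center loss.

    C, C_hat, M : lists of per-layer tensors (full-precision weights,
                  binarized weights, M-Filters), broadcast-compatible shapes.
    feats       : (num_samples, feat_dim) last-layer feature maps f_m.
    class_means : (num_classes, feat_dim) class-specific mean feature maps.
    labels      : (num_samples,) class index of each sample.
    """
    # Filter loss: how well the modulated binarized weights reconstruct C.
    filter_loss = sum(((c - c_hat * m) ** 2).sum()
                      for c, c_hat, m in zip(C, C_hat, M))

    # Center loss: pull each feature map toward its class-specific mean.
    center_loss = ((feats - class_means[labels]) ** 2).sum()

    return theta / 2 * filter_loss + lam / 2 * center_loss
```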

PCNNs [77] propose a projection loss for discrete backpropagation and are the first to define the quantization of the input variable as a projection onto a set, from which the projection loss is derived. Our BONNs [287] propose a Bayesian-optimized 1-bit CNN model to significantly improve the performance of 1-bit CNNs. BONNs incorporate the prior distributions of full-precision kernels, features, and filters into a Bayesian framework to construct 1-bit CNNs comprehensively, in an end-to-end manner. Denoting the quantization error by $y$ and the full-precision weights by $x$, BONNs maximize $p(x|y)$ to optimize $x$ for quantization so that the reconstruction error is minimized. Since the distribution of $x$ is known, $p(x|y) \propto p(y|x)\,p(x)$, and the problem can be solved as maximum a posteriori (MAP) estimation. Feature quantization is handled in the same way. Therefore, the Bayesian loss is as follows:

$$\begin{aligned}
\mathcal{L}_B = {} & \frac{\lambda}{2}\sum_{l=1}^{L}\sum_{i=1}^{C_o^l}\sum_{n=1}^{C_i^l}\Big\{\big\|\hat{k}_n^{l,i} - w^l \circ k_n^{l,i}\big\|_2^2 \\
& \quad + \nu\,\big(k_n^{l,i+} - \mu_i^{l+}\big)^{T}\big(\Psi_i^{l+}\big)^{-1}\big(k_n^{l,i+} - \mu_i^{l+}\big) \\
& \quad + \nu\,\big(k_n^{l,i-} - \mu_i^{l-}\big)^{T}\big(\Psi_i^{l-}\big)^{-1}\big(k_n^{l,i-} - \mu_i^{l-}\big) - \nu\log\big(\det(\Psi^{l})\big)\Big\} \\
& + \frac{\theta}{2}\sum_{m=1}^{M}\Big\{\|f_m - c_m\|_2^2 + \sum_{n=1}^{N_f}\big[\sigma_{m,n}^{-2}(f_{m,n} - c_{m,n})^2 + \log\big(\sigma_{m,n}^{2}\big)\big]\Big\},
\end{aligned} \tag{1.15}$$
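The $\theta$-weighted part of Eq. (1.15) follows from the Gaussian assumption on features around their class centers: the negative log-likelihood yields the quadratic terms weighted by $\sigma^{-2}$ plus the $\log\sigma^2$ regularizer. The sketch below illustrates only this feature term; the function name, tensor layout, and per-dimension variances are assumptions for illustration rather than the actual BONN implementation.

```python
import torch

def bonn_feature_loss(feats, centers, log_var, labels, theta=1e-3):
    """Sketch of the theta-weighted feature term of the Bayesian loss, Eq. (1.15).

    feats    : (M, N_f) last-layer features f_m.
    centers  : (num_classes, N_f) class centers c_m.
    log_var  : (num_classes, N_f) learned log-variances log(sigma^2_{m,n}).
    labels   : (M,) class index of each sample.
    """
    c = centers[labels]        # c_m for each sample
    lv = log_var[labels]       # log(sigma^2_{m,n}) for each sample
    diff_sq = (feats - c) ** 2

    # ||f_m - c_m||_2^2 + sum_n [ sigma^{-2}(f_{m,n} - c_{m,n})^2 + log sigma^2 ]
    loss = diff_sq.sum() + (diff_sq * torch.exp(-lv) + lv).sum()
    return theta / 2 * loss
```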